Forward masking for increased robustness in automatic speech recognition

Authors

Sascha Wendt

Gernot A. Fink

Franz Kummert

Abstract

In automatic speech recognition, mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are commonly used features today. However, their calculation accounts for only a few properties of the auditory system. On the assumption that the human representation of speech is optimal, taking more properties of the auditory system into account might improve the performance of automatic speech recognition systems. In this paper, a model proposed by Strope and Alwan [1], which is based on human acoustic perception and accounts for the effect of forward masking, is incorporated, after some modifications, into an automatic speech recognition system with an MFCC-based front-end. The extended system is evaluated on recognition tasks that are closer to real-world recognition than the (connected) digit recognition commonly used in the literature. The evaluations show increased robustness of the speech recognition system with forward masking on all recognition tasks, and especially on data recorded in noisy environments.
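To make the idea of forward masking in a filterbank front-end concrete, here is a minimal sketch in Python. It is not the Strope and Alwan model or the modified version used in the paper, whose details the abstract does not give. It assumes a toy rule: each band keeps a masker level that decays linearly (in log units) from frame to frame, and any later frame that falls below the decayed masker is attenuated. The function name `forward_mask` and the parameters `decay` and `weight` are hypothetical.

```python
from typing import List


def forward_mask(
    log_energies: List[List[float]],
    decay: float = 2.0,
    weight: float = 0.5,
) -> List[List[float]]:
    """Toy forward masking on per-frame log filterbank energies.

    log_energies[t][k] is the log energy of band k at frame t.
    Assumption (not from the paper): the masker in each band decays by
    `decay` log units per frame, and a frame lying below the decayed
    masker is lowered by `weight` times the shortfall.
    """
    if not log_energies:
        return []
    n_bands = len(log_energies[0])
    # No masker exists before the first frame.
    masker = [float("-inf")] * n_bands
    masked: List[List[float]] = []
    for frame in log_energies:
        out_frame = []
        for k, x in enumerate(frame):
            decayed = masker[k] - decay           # masker after one frame of decay
            shortfall = max(0.0, decayed - x)     # how far the frame lies below it
            out_frame.append(x - weight * shortfall)
            masker[k] = max(x, decayed)           # a louder frame becomes the new masker
        masked.append(out_frame)
    return masked


if __name__ == "__main__":
    # One band: a loud frame, two quiet frames, then a loud frame again.
    frames = [[10.0], [4.0], [4.0], [10.0]]
    print(forward_mask(frames))
```

In this sketch, onsets pass through unchanged while quiet frames that follow a loud one are suppressed, with the suppression fading as the masker decays. In an MFCC pipeline, a step of this kind would sit between the log filterbank energies and the cepstral transform.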

Similar resources

A model of dynamic auditory perception and its application to robust word recognition

This paper describes two mechanisms that augment the common automatic speech recognition (ASR) front end and provide adaptation and isolation of local spectral peaks. A dynamic model consisting of a linear filterbank with a novel additive logarithmic adaptation stage after each filter output is proposed. An extensive series of perceptual forward masking experiments, together with previously rep...

Forward masking on a generalized logarithmic scale for robust speech recognition

This paper examines the forward masking on the generalized logarithmic scale for robust speech recognition to both additive and convolutional noise. The forward masking in the dynamic cepstral (DyC) representation is based upon subtraction of a masking pattern from a current spectrum on a logarithmic spectral domain, whereas the proposed method intends to make a compromise between the logarithm...

An improved model of masking effects for robust speech recognition system

The performance of an automatic speech recognition system drops dramatically in the presence of background noise, unlike the human auditory system, which is more adept at recognizing noisy speech. This paper proposes a novel auditory modeling algorithm which is integrated into the feature extraction front-end for Hidden Markov Model (HMM). The proposed algorithm is named LTFC which simulates properti...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Improved Forward Masking on a Generalized Logarithmic Scale for Robust Speech Recognition

We previously proposed a forward masking on a generalized logarithmic scale to eliminate convolutional noise as well as to suppress additive noise. While the generalized Dynamic Cepstrum derived from the masked spectrum has been robust to both noises, the robustness to convolutional noise slightly degrades as compared to masking on the logarithmic scale, and the optimal masking coefficient depe...

Publication date: 2001
